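In the medical domain, landmark detection in MRI plays an important role in reducing the effort required of medical technicians in tasks such as scan planning and image registration. First, 88 landmarks distributed across the three respective views (sagittal, coronal, and axial) are manually annotated; guidelines from expert clinical technicians are then collected per sub-anatomy to better localize the existing landmarks, so that important atlas landmarks can be located even in oblique scans. To overcome limited data availability, we implement realistic data augmentation to generate synthetic 3D volumetric data. We use a modified HighRes3DNet model to solve the landmark detection problem on brain MRI volumes. To visually explain our trained models, and to discern a stronger model from a weaker one, we implement Gradient-weighted Class Activation Mapping (Grad-CAM), which produces a coarse localization map highlighting the regions the model focuses on. Our experiments show that the proposed method yields favorable results, and that the overall pipeline can be extended to a variable number of landmarks and to other anatomies.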
Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
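To illustrate the weight averaging idea behind SWA mentioned in the abstract above, here is a minimal sketch in plain Python. It is a generic illustration and not the authors' adaptation for PLMs: the names `WeightAverager` and `fine_tune_with_swa`, and the parameters `swa_start` and `swa_every`, are hypothetical, and a toy gradient function stands in for a real fine-tuning loss.

```python
from typing import Callable, Dict, List

Weights = Dict[str, List[float]]


class WeightAverager:
    """Keeps an equal-weight running average of parameter snapshots."""

    def __init__(self) -> None:
        self.n_averaged = 0
        self.average: Weights = {}

    def update(self, weights: Weights) -> None:
        if self.n_averaged == 0:
            # First snapshot: copy it so later training steps do not alias it.
            self.average = {name: list(vals) for name, vals in weights.items()}
        else:
            n = self.n_averaged
            for name, vals in weights.items():
                avg = self.average[name]
                for i, w in enumerate(vals):
                    # Incremental mean: avg_new = avg + (w - avg) / (n + 1)
                    avg[i] += (w - avg[i]) / (n + 1)
        self.n_averaged += 1


def fine_tune_with_swa(
    weights: Weights,
    grad_fn: Callable[[Weights], Weights],
    lr: float,
    total_steps: int,
    swa_start: int,   # hypothetical: first step whose weights are averaged
    swa_every: int,   # hypothetical: averaging interval in steps
) -> Weights:
    """Plain gradient descent with snapshot averaging (illustrative only).

    Note: `weights` is updated in place; the averaged weights are returned.
    """
    averager = WeightAverager()
    for step in range(1, total_steps + 1):
        grads = grad_fn(weights)
        for name in weights:
            weights[name] = [
                w - lr * g for w, g in zip(weights[name], grads[name])
            ]
        if step >= swa_start and (step - swa_start) % swa_every == 0:
            averager.update(weights)
    # Fall back to the last iterate if no snapshot was collected.
    return averager.average if averager.n_averaged else weights


if __name__ == "__main__":
    # Toy loss 0.5 * (w - 1)^2, whose gradient is (w - 1).
    toy_grad = lambda ws: {"w": [w - 1.0 for w in ws["w"]]}
    print(fine_tune_with_swa({"w": [0.0]}, toy_grad, 0.5, 4, 3, 1))
```

The averaging adds only one extra copy of the parameters and a cheap update per snapshot, which matches the abstract's point that no separate teacher model needs to be trained.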
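The drive to increase the rated capacity of wind turbines has led to larger generators, longer blades, and taller towers. Wind turbine manufacturers currently offer turbines of up to 16 MW, an increase in design capacity of nearly 60% over the past five years. Manufacturing these turbines involves assembling enormous components. Because of frequent design changes and the variety of tasks involved, automation has not been feasible, which makes manufacturing a labor-intensive activity. However, the handling and assembly of large components challenge human capabilities. This article proposes the use of mobile robot assistants to partially automate wind turbine manufacturing. Robot assistants can lower production costs and improve working conditions. The article presents the development of a robot assistant that helps human operators carry out wind turbine assembly effectively. The case comes from a leading wind turbine manufacturer. The developed system is also applicable to other large-scale manufacturing cases that involve intensive manual work.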
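Physics-informed neural networks (PINNs) are capable of finding the solution to a given boundary value problem. We employ several ideas from the finite element method (FEM) to enhance the performance of existing PINNs in engineering problems. The main contribution of the current work is to promote using the spatial gradient of the primary variable as an output of separate neural networks. The strong form, which involves higher-order derivatives, is then applied to the spatial gradients of the primary variable as the physical constraint. In addition, the so-called energy form of the problem is applied to the primary variable as an additional constraint for training. The proposed approach requires only first-order derivatives to construct the physical loss functions. We discuss why this is beneficial through various comparisons between different models. PINNs based on the mixed formulation and FE methods share some similarities: the former minimize the PDE and its energy form using the complex nonlinear interpolation of a neural network, while the latter do the same at element nodes with the help of shape functions. We focus on heterogeneous solids to show the capability of deep learning to predict the solution in a complex environment under different boundary conditions. The proposed PINN model is checked against FEM solutions on two prototype problems: elasticity and the Poisson equation (a steady-state diffusion problem). We conclude that, by properly designing the network architecture in PINNs, the deep learning model has the potential to solve for the unknowns in a heterogeneous domain without any available initial data from other sources. Finally, we discuss combining PINNs and FEM for fast and accurate design of composite materials in future developments.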
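The claim in the PINN abstract above that only first-order derivatives are needed can be illustrated on the steady-state diffusion case. The following is a sketch under assumptions not stated in the abstract: the symbols $T$ (primary variable), $k$ (spatially varying conductivity), $f$ (source term), $\mathbf{g}$ (gradient output by a separate network), and $\Omega$ (domain) are chosen here for illustration, and boundary terms are omitted from the energy.

```latex
% Assumed model problem: steady-state diffusion with conductivity k(x) and source f(x).
\begin{align}
  -\nabla\cdot\bigl(k\,\nabla T\bigr) &= f \quad \text{in } \Omega
    && \text{(strong form, second order in } T\text{)} \\
  \mathbf{g} &= \nabla T
    && \text{(gradient predicted by a separate network)} \\
  \nabla\cdot\bigl(k\,\mathbf{g}\bigr) + f &= 0
    && \text{(strong form on } \mathbf{g}\text{, first order)} \\
  \Pi(T) &= \int_\Omega \Bigl(\tfrac{1}{2}\,k\,\nabla T\cdot\nabla T - f\,T\Bigr)\,\mathrm{d}\Omega
    && \text{(energy form, first order in } T\text{)}
\end{align}
```

Under these assumptions, a loss built from the residuals of the second and third lines plus the energy $\Pi(T)$ differentiates each network output only once, whereas penalizing the first line directly would require second derivatives of $T$.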
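Thanks to improvements in artificial intelligence, speaker identification (SI) technologies have made great strides and are now widely used in a variety of fields. One of the most important components of SI is feature extraction, which has a substantial impact on the SI process and its performance. As a result, numerous feature extraction strategies have been thoroughly investigated, contrasted, and analyzed. This article applies five distinct feature extraction methods to speaker identification for disguised voices in emotional environments. To evaluate this work thoroughly, three effects are used: high-pitched, low-pitched, and Electronic Voice Conversion (EVC). Experimental results show that the concatenation of Mel-Frequency Cepstral Coefficients (MFCCs), MFCCs-delta, and MFCCs-delta-delta is the best feature extraction method.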
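The concatenated feature named in the speaker identification abstract above can be sketched as follows. This is an illustration only: the abstract does not say how the deltas are computed, so the code assumes the common regression formula over a window of ±2 frames with edge padding, and the function names `delta` and `concat_mfcc_deltas` are hypothetical. The MFCCs themselves are taken as given input.

```python
from typing import List

Frames = List[List[float]]  # one list of coefficients per time frame


def delta(frames: Frames, window: int = 2) -> Frames:
    """Regression-based delta: sum_n n*(c[t+n] - c[t-n]) / (2 * sum_n n^2).

    Frames beyond either end are replaced by the nearest edge frame
    (an assumption; other padding schemes exist).
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: Frames = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def concat_mfcc_deltas(mfccs: Frames, window: int = 2) -> Frames:
    """Concatenate static, delta, and delta-delta coefficients per frame."""
    d1 = delta(mfccs, window)   # first-order dynamics
    d2 = delta(d1, window)      # second-order dynamics (delta of delta)
    return [s + a + b for s, a, b in zip(mfccs, d1, d2)]


if __name__ == "__main__":
    # Toy input: a single coefficient that rises linearly over six frames.
    ramp = [[float(t)] for t in range(6)]
    for frame in concat_mfcc_deltas(ramp):
        print(frame)
```

Each output frame is three times as long as the input frame, so 13 static MFCCs, for example, would yield a 39-dimensional feature vector.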
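In still-image human action recognition, existing studies have mainly leveraged extra bounding box information along with class labels to mitigate the lack of temporal information in still images; however, preparing extra data with manual annotation is time-consuming and prone to human error. Moreover, existing studies have not addressed action recognition with a long-tailed distribution. In this paper, we propose a two-phase multi-expert classification method for human action recognition that copes with long-tailed distributions by means of super-class learning and without any extra information. To choose the best configuration for each super-class and to characterize inter-class dependencies between different action classes, we propose a Graph-Based Class Selection (GCS) algorithm. In the proposed approach, a coarse-grained phase selects the most relevant fine-grained experts. The fine-grained experts then encode the intricate details within each super-class so that inter-class variation increases. Extensive experimental evaluations are conducted on various public human action recognition datasets, including Stanford40, Pascal VOC Action, BU101+, and IHAR. The experimental results demonstrate that the proposed method yields promising improvements. More specifically, on the IHAR, Stanford40, Pascal VOC 2012 Action, and BU101+ benchmarks, the proposed approach outperforms state-of-the-art studies by 8.92%, 0.41%, 0.66%, and 2.11%, respectively, at much lower computational cost and without any auxiliary annotation information. In addition, the proposed method is shown to outperform its counterparts by a significant margin on action recognition with a long-tailed distribution.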
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O(m\times\left( 2+ \alpha (G) \right)+n)$ to $O(m+n)$ for any given value of $\varepsilon$, for a complete signed graph with $n$ vertices and $m$ positive edges, where $\alpha(G)$ is the arboricity of the graph. Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a fully dynamic setting where edge sign flipping and vertex addition/removal are allowed. Constructing the index that our approach relies on costs $O(m)$ memory and $O(m\times\alpha(G))$ time. We also study the structural properties of the non-agreement measure used in the approximation algorithm. The theoretical results are accompanied by a full set of experiments on seven real-world graphs. These results show the superiority of our index-based algorithm over the non-index one, with an average decrease of 34% in running time.
This paper proposes a novel self-supervised Cut-and-Paste GAN that performs foreground object segmentation and generates realistic composite images without manual annotations. We accomplish this goal with a simple yet effective self-supervised approach coupled with a U-Net-based discriminator. The proposed method extends the ability of standard discriminators to learn not only global data representations via classification (real/fake) but also semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn from informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that the proposed method significantly outperforms state-of-the-art methods on standard benchmark datasets.